Adaptive Beamformer Based on Average Vowel / Consonant Spectrum with Phoneme Identification
Authors
Abstract
For teleconference systems and voice-controlled systems, high-quality sound capture of distant-talking speech is very important. However, background noise and room reverberation seriously degrade sound capture quality in real acoustic environments. A microphone array is an ideal candidate for capturing distant-talking speech. With a microphone array, the desired speech signals can be acquired selectively by steering the directivity. Accordingly, very high directivity is necessary to reduce noise signals. Two conventional approaches to forming directivity have been proposed: the delay-and-sum beamformer [1] and adaptive beamformers [2] [3]. A delay-and-sum beamformer forms very high directivity toward the desired signal, whereas an adaptive beamformer forms null directivity toward the noise signal. However, delay-and-sum beamformers have two serious drawbacks: they cannot capture the desired signal well enough without a sufficient number of transducers, and their performance degrades in highly reverberant rooms. Adaptive beamformers, on the other hand, can form null directivity with a small number of transducers. Furthermore, they can form sharper directivity than a delay-and-sum beamformer. Consequently, adaptive beamformers are often used for the front-end processing of ASR (Automatic Speech Recognition). AMNOR (Adaptive Microphone-array for NOise Reduction) [3] is an adaptive beamformer proposed by Kaneda et al. in 1986. It promises high-quality sound capture even in real acoustic environments. S-AMNOR [4] was also proposed at ICSLP 2002. S-AMNOR is a modified AMNOR based on a long-time speech spectrum, designed to capture distant-talking speech with high quality. However, S-AMNOR is not well suited to recognizing distant-talking speech, because the characteristics of speech differ between vowels and consonants.
Therefore, in this paper we attempt to improve the speech recognition performance of S-AMNOR by using two adaptive filters based on the vowel and consonant spectra, together with phoneme identification.
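As a rough illustration of the delay-and-sum idea mentioned in the abstract (not the AMNOR or S-AMNOR method itself), the following is a minimal sketch in plain Python. It assumes integer sample delays and a toy four-microphone scene; the function names, signal periods, and delay values are hypothetical choices made for this example only.

```python
import math


def delay_and_sum(channels, delays):
    """Advance each channel by its integer delay, then average the channels.

    channels: list of equal-length sample lists, one per microphone.
    delays:   arrival delay of the desired source at each microphone, in
              samples (hypothetical values; normally set by array geometry).
    Only samples for which every channel is available are returned.
    """
    n_valid = len(channels[0]) - max(delays)
    return [
        sum(x[t + d] for x, d in zip(channels, delays)) / len(channels)
        for t in range(n_valid)
    ]


def power(x):
    """Mean squared value of a sample list."""
    return sum(v * v for v in x) / len(x)


def target(t):
    """Desired source: sinusoid with a 25-sample period (assumed)."""
    return math.sin(2 * math.pi * t / 25)


def interferer(t):
    """Noise source: sinusoid with a 20-sample period (assumed)."""
    return math.sin(2 * math.pi * t / 20)


n = 400
target_delays = [0, 1, 2, 3]       # target direction (assumed)
interferer_delays = [0, 4, 8, 12]  # interferer direction (assumed)

# What each microphone receives from each source separately.
target_channels = [[target(t - d) for t in range(n)] for d in target_delays]
noise_channels = [[interferer(t - e) for t in range(n)] for e in interferer_delays]

# Steer toward the target. The beamformer is linear, so the two sources
# can be passed through it separately to see what happens to each one.
target_out = delay_and_sum(target_channels, target_delays)
noise_out = delay_and_sum(noise_channels, target_delays)

noise_in_power = power(noise_channels[0])
noise_out_power = power(noise_out)
print("interferer power at one microphone:", noise_in_power)
print("interferer power after beamforming:", noise_out_power)
```

In this toy scene the target passes through unchanged because its copies add in phase, while the interferer is only partially attenuated. That is consistent with the abstract's remark that a delay-and-sum beamformer needs a sufficient number of transducers to perform well.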
Similar resources
Patterns of phoneme perception errors by listeners with cochlear implants as a function of overall speech perception ability.
Many studies have noted great variability in speech perception ability among postlingually deafened adults with cochlear implants. This study examined phoneme misperceptions for 30 cochlear implant listeners using either the Nucleus-22 or Clarion version 1.2 device to examine whether listeners with better overall speech perception differed qualitatively from poorer listeners in their perception...
How Transitions and Local Context Affect Segment Identification
Theories on the mechanisms of phoneme identification generally involve only the actual segment and the transitions to and from neighbouring segments. In a listening experiment we tested the importance for vowel and consonant identification of the presence of speech segments beyond the transition parts. The results clearly show that identification continues to improve when speech is added beyo...
State Space Point Distribution Parameter for Support Vector Machine Based CV Unit Classification
In this paper we extend Support Vector Machines (SVM) to speaker-independent Consonant-Vowel (CV) unit classification. We adopt the technique known as Decision Directed Acyclic Graph (DDAG), which is used to combine many two-class classifiers into a multiclass classifier. Using Reconstructed State Space (RSS) based State Space Point Distribution (SSPD) parameters, we obtain an average sp...
Effects of vowel context on the recognition of initial and medial consonants by cochlear implant users.
OBJECTIVE: Scores on consonant-recognition tests are widely used as an index of speech-perception ability in cochlear implant (CI) users. The consonant stimuli in these tests are typically presented in the /alpha/ vowel context, even though consonants in conversational speech occur in many other contexts. For this reason, it would be useful to know whether vowel context has any systematic effect...
Effects of phoneme repertoire on phoneme decision.
In three experiments, listeners detected vowel or consonant targets in lists of CV syllables constructed from five vowels and five consonants. Responses were faster in a predictable context (e.g., listening for a vowel target in a list of syllables all beginning with the same consonant) than in an unpredictable context (e.g., listening for a vowel target in a list of syllables beginning with di...